The question of whether DNA conducts electric charges is intriguing to
physicists and biologists alike. The suggestion that electron transfer/transport in
DNA might be biologically important has triggered a series of experimental 
and theoretical investigations. Here, we review recent theoretical progress by 
concentrating on quantum-chemical, molecular dynamics-based approaches to 
short DNA strands and physics-motivated tight-binding transport studies of long 
or even complete DNA sequences. In both cases, we observe small, but signif- 
icant differences between specific DNA sequences such as periodic repetitions 
and aperiodic sequences of AT bases, λ-DNA, centromeric DNA, promoter
sequences as well as random-ATGC DNA.

1. Introduction

Charge transfer in DNA is currently the subject of intense theoretical and
experimental investigations [1-4]. DNA, which is the blueprint of life, is being
considered as a molecular wire in a new generation of electronic devices and
computers. However, its electronic properties are elusive and remain
controversial. Despite the current debate, the subject is far from new. Soon after Watson
and Crick discovered the double-helix structure of DNA [5], Eley and Spivey
were the first to suggest that DNA could serve as an electronic conductor [6].
The notion of a molecular wire is thought to apply to the DNA double helix
because of its π-electron system (the π-way) of bases stacked upon each other.
More recently, Barton and colleagues [7] measured the fluorescence of an
excited molecule and found that it no longer emitted light when attached to DNA.
Their results suggested that this "fluorescence quenching" was due to the charge
on the excited donor molecule leaking along the length of the DNA to a nearby
acceptor molecule.

Other extensive experimental and theoretical work over the past decade has 
led to substantial clarification of charge-transfer mechanisms in DNA [7-20]. 
The dominant mechanisms appear to be both short-range quantum mechanical 
tunneling and long-range thermally activated hopping. Guanine has the high- 
est occupied molecular orbital (HOMO) level of the four bases, and can act as 
a trap for holes. Experiments on repeats of this base are used to investigate 
long range hopping, and models have been developed to clarify the long range 
hopping data in G-repeats [14]. Charge transport in DNA is also made more 
complex because of the influences of the local environment, such as counteri- 
ons, thermal vibrations, contact resistance, and sequence variability, which are 
difficult to control [21-25]. Which charge-transfer mechanisms operate in DNA,
and whether DNA is a good conductor at all, remain somewhat unsettled.
Theory is of great help in understanding these phenomena, but given the
computational cost of full-scale calculations on realistic DNA systems, theoretical
efforts to date have mostly been limited to small- and medium-size model
systems [26-28], to dry DNA molecules [23, 29, 30], or to larger systems using
model Hamiltonians [31-42] and semi-empirical studies [22, 43-53].

In this review, we shall first focus on the use of quantum-chemical meth- 
ods which can treat smaller, but atomistically correct segments of DNA in Sec. 
2. After an introduction to the construction of the DNA molecules and the 
density-functional based methods in Sec. 2.1 - 2.3, we then present results, 
many of which are new, in sections 2.4 to 2.5. In the next large section 3, we 
use the lessons learned from the atomistic approach and now study an effective 
and necessarily rather coarse-grained Hamiltonian model of DNA to reveal the 
interplay of sequence fidelity and transport. Again, models, methods and DNA 
sequences are introduced in Sec. 3.1 - 3.3. Sections 3.4 and 3.5 include the 
obtained results. We conclude and summarise in Sec. 4. 

2. Quantum chemical methods for short DNA strands 

Within a density functional based local orbital tight-binding-like formalism, 
more complex problems can be investigated with a modest decrease in the ac- 
curacy. This is particularly useful where a quantum mechanical description 
is important to the investigated system's fundamental chemistry, yet where a 
smaller model system would inadequately describe the proper physical
environment. With the increase in computational power, great effort has been made
by the electronic-structure community to optimise the performance of quantum
mechanical methods. Calculating larger systems without making stringent
approximations has only been possible within the past few years. In this chapter,
we theoretically investigate the electronic states of model DNA structures as the 
molecule undergoes classical thermal motion at room temperature by means of 
marrying classical molecular dynamics simulations with an electronic structure 
density-functional method. We investigate the dynamics of the DNA structure 
and its impact on the electronic structure. A similar approach was recently used 
to postulate the charge migration mechanism in DNA, with injected charges 
being gated in a concerted manner by thermal motions of hydrated counteri- 
ons [17]. Here we study a longer oligonucleotide duplex than previous studies, 
and demonstrate with the complete system that its electronic states dynamically 
localize. The mechanism is an Anderson off-diagonal dynamic disorder model 
similar to the static disorder that leads to localised band-tail states in amorphous 
semiconductors [54-57]. The concept of static Anderson localization in DNA 
has previously been considered by Ladik [58, 59]. We show that localization in 
DNA reaches far deeper in energy than just band tail states. We demonstrate for 
the first time this effect in a hydrated poly(dA)-poly(dT) 10 base-pair fragment; 
this represents one complete turn of the B-DNA double helix. 

2.1. Generating the poly(dA)-poly(dT) DNA structures 

In this chapter, we consider thermal fluctuations of a poly(dA)-poly(dT) DNA 
10-mer duplex fragment at room temperature from classical MD simulations; 
therefore, aperiodic structures of DNA are generated throughout the simulation. 
With our local-orbital density-functional method, we compare the electronic 
states of an idealised model periodic canonical B-DNA poly(dA)-poly(dT) DNA 
structure with those thermally-distorted aperiodic poly(dA)-poly(dT) DNA struc- 
tures generated from the MD simulation. 

Canonical B-DNA 10-base pair models of poly(dA)-poly(dT) were built into
an Arnott B-DNA [60] model using the nucgen DNA builder contained within
AMBER 5.0 [61]. Classical molecular dynamics trajectories of the B-DNA
models, including explicit water and sodium counterions, were generated using
CHARMM (version c26nl) [62]. Both models were solvated with enough
pre-equilibrated TIP3P [62] water to add 12.0 Å to the maximal distance
extent of the DNA. Net-neutralising Na+ ions [63] were placed off the phosphate
oxygen bisector and then minimised (with larger, 5.0 Å, van der Waals radii)
in vacuo prior to solvating the system. Equilibration involved the application of
harmonic positional restraints (25.0 kcal/(mol Å²)) and 250 steps of ABNR
minimisation, followed by 25 ps of MD where the temperature was ramped up from
50 to 300 K in 1 ps intervals. The initial equilibration was performed with the
Cornell et al. force field [64]. Subsequent equilibration with the ??? (BMS)
force field of Langley [65] involved 250 steps of ABNR minimisation followed
by 5 ps of MD with position restraints.

All production simulations were performed without any restraints, using the
BMS force field of Langley. Production simulation was performed for 10
nanoseconds with CHARMM (version c26nl) [62] in a consistent manner. This
involved constant temperature (300 K, mass = 1000) [66] and pressure (1 atm,
piston mass = 500 amu, relaxation time = 20 ps⁻¹) [67], 2 fs time steps with
the application of SHAKE [68] on hydrogen atoms, accurate use of the particle
mesh Ewald method [69] (~1.0 Å grid size with 6th-order B-spline interpolation
and an Ewald coefficient of 0.34) in rhombic dodecahedral unit cells (x = y = z,
α = 60°, β = 90°, γ = 60°), and a heuristically updated atom-based pairlist built
to 12.0 Å and cut off at 10.0 Å with a smooth shift of the van der Waals
energies. These methods have proven reliable for representing DNA duplex
structure [70, 71], and the BMS force field very accurately models B-DNA crystal
structures [65, 72].

After an initial equilibration of an explicitly solvated 10-mer B-DNA poly(dA)- 
poly(dT) with explicit Na+ ions, production molecular dynamics simulations 
(applying an accurate particle mesh Ewald treatment of the electrostatics) were 
performed for 10 ns. As shown in Fig. 1, a plot of the all-atom root-mean- 
squared deviation over the entire run is rather stable, and although thermal 
fluctuations are clearly evident, no large scale distortions of the structure were 
observed (beyond sugar repuckering, and expected base and backbone fluctua- 
tions). 

2.2. Electronic structure calculations of molecular dynamics snapshots 

A stable portion of the trajectory from 1.5-2.5 ns, at 0.5 ps intervals, was 
analysed further using the FIREBALL DFT methodology [73]. FIREBALL is a 
first principles tight-binding molecular dynamics (TBMD) simulation technique 
based on a self-consistent version of the Harris-Foulkes [74,75] functional [76]. 
In this method, confined atomic-like orbitals are used as a basis set for the de- 
termination of the occupied eigenvalues and eigenvectors of the one-electron 
Hamiltonian. The "fireball" orbitals, introduced by Sankey and Niklewski [77],
are obtained by solving the atomic problem with the boundary condition that
the atomic orbitals vanish at and beyond a predetermined cutoff radius.
This boundary condition is equivalent to an "atom
in the box" and has the effect of raising the electronic energy levels due to
confinement. An important advantage of the Sankey and Niklewski basis set is
that the Hamiltonian and the overlap matrix elements of the system are quite 
sparse for large systems, reducing overall computation time. A summary of
the method is given in Ref. [73] and references therein. All poly(dA)-poly(dT)
DNA atoms, including phosphate groups and backbone atoms, are included in
the single-point calculations, which contained 10 base pairs (644 atoms).
Although the MD simulations are performed with full hydration and counterions,
we include only 350 water molecules in our electronic structure calculations;
this number of molecules represents approximately 2 solvation layers
surrounding the molecule. Adding all water and cation atoms to more correctly represent
the environment surrounding the DNA molecule will be the subject of future
work.

Fig. 1. Shown in black and gray are the all-atom best-fit root-mean-squared deviations (in Å)
as a function of time compared to canonical B-DNA (gray) and the straight coordinate average
structure from the 1.5-2.5 ns portion of the trajectory (at 0.5 ps intervals).

2.3. Quantifying the Degree of Localization 

The phenomenon of Anderson localization [54, 78] refers to the localization
of mobile quantum mechanical entities, such as spins or electrons, due to
impurities, spin diffusion, or randomness. Anderson localization applied to DNA
may come from two distinct mechanisms, diagonal or off-diagonal disorder.
Localization induced by diagonal disorder arises from variations of the sequence
along the base stack, and off-diagonal disorder arises from variations either in the
bonding between bases along the stack or in the hydrogen bonding
across the double helix. The qualitative physics of localization is described by
an Anderson model [54],

$$H = \sum_i \varepsilon_i \, |i\rangle\langle i| + \sum_{i \neq j} t_{ij} \, |i\rangle\langle j| ,$$

where each molecular orbital (MO) $i$ of a base has energy $\varepsilon_i$ and interacts with
its nearest neighbour base MO $j$ ($i \neq j$) with a Hamiltonian hopping
interaction of $t_{ij}$. The Anderson model of diagonal disorder randomly varies the
on-site Hamiltonian matrix elements (diagonal) [78] and describes the A-T-G-C
random sequencing of DNA [31].

Here we focus on B-DNA structures of poly(dA)-poly(dT) in which there
exists only one base pair combination A-T; each strand has only a single type
of base in its stack. In this system, only off-diagonal disorder [79] may occur.
The bonds within a single base are strong, but thermal fluctuations, coupled with
the weak π-bonding along the stack and the weak hydrogen bonds across the
strands of the DNA double helix, allow individual bases significant freedom of
movement, including transient base pair opening and DNA breathing events
over millisecond time scales [80] and large fluctuations in the structure [81].
Stochastic fluctuations of the weak bonding modulate the electronic coupling,
$t_{ij}$, between adjacent bases. If the dynamic fluctuations of $t_{ij}$ are large enough,
localised electronic states are produced as in an amorphous solid.

We quantify the spatial extent of an electronic state by defining the number
of accessible atoms, $W$, from the electronic state quantum entropy. For a
particular state $\nu$, the wavefunction $\psi(\nu)$ has a Mulliken population $p_i(\nu)$ on
atom $i$, which loosely is considered the probability that an electron in state $\nu$
resides on a particular atom $i$. The populations are normalised, $\sum_i p_i(\nu) = 1$.
From probability theory, we define a quantum entropy for state $\nu$ as

$$S(\nu) = -\sum_i p_i(\nu) \ln p_i(\nu) .$$

For example, a state $\nu$ with equal probabilities over $N_0$ atoms ($N_0 < N_{\mathrm{Total}}$)
gives an entropy of $\ln N_0$. From Boltzmann's equation, we can determine the
number of accessible atoms $W(\nu)$ for electronic state $\nu$ as $S(\nu) = \ln W(\nu)$, or

$$W(\nu) = e^{S(\nu)} .$$

Our example state with equal probabilities spread over $N_0$ atoms gives the
expected result, $W(\nu) = N_0$. For the complex electronic states of DNA, the
number of accessible atoms $W(\nu)$ gives a quantitative, and easily calculable,
measure of how many atoms a particular electronic state $\psi(\nu)$ reaches.
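
As a minimal sketch of this measure, the Python snippet below computes $W$ from a list of populations and applies it to a toy nearest-neighbour chain with and without off-diagonal disorder. The chain, its length, the log-normal hopping distribution, the helper names and the use of $|c_i|^2$ as populations (which coincides with a Mulliken analysis only for an orthogonal basis) are illustrative assumptions and not the setup used in the calculations reported here.

```python
import numpy as np

def accessible_atoms(populations):
    """Return W = exp(S) with S = -sum_i p_i ln p_i for one electronic state."""
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()                      # enforce sum_i p_i = 1
    nz = p[p > 0.0]                      # the term 0 * ln 0 is taken as 0
    entropy = -np.sum(nz * np.log(nz))
    return float(np.exp(entropy))

def chain_hamiltonian(hoppings):
    """Toy nearest-neighbour chain, zero on-site energies (off-diagonal terms only)."""
    t = np.asarray(hoppings, dtype=float)
    return np.diag(t, 1) + np.diag(t, -1)

def mean_accessible_sites(hoppings):
    """Average W over all eigenstates, using |c_i|^2 as the site populations."""
    _, vecs = np.linalg.eigh(chain_hamiltonian(hoppings))
    return float(np.mean([accessible_atoms(v ** 2) for v in vecs.T]))

rng = np.random.default_rng(0)           # fixed seed, toy data only
n_sites = 100                            # hypothetical chain length
w_ordered = mean_accessible_sites(np.ones(n_sites - 1))
w_disordered = mean_accessible_sites(np.exp(rng.normal(0.0, 1.0, n_sites - 1)))
print(w_ordered, w_disordered)
```

In this toy setting the ordered chain yields states that spread over most sites, while fluctuating hoppings reduce the average $W$, which is the qualitative behaviour the entropy-based measure is meant to capture.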

2.4. Electronic states of a periodic poly(dA)-poly(dT) DNA

To demonstrate that localization is not due to limitations of using localized or- 
bitals, a 10-base pair periodic structure of poly(dA)-poly(dT) was created based 
on the Arnott B-DNA [60] fiber model. Each base pair is rotated by 36° and
translated by 3.38 Å; therefore, 10 base pairs complete one full pitch of the
double helix and periodicity is enforced in the program. The population densities
for the highest occupied molecular orbital (HOMO) and the lowest unoccupied 
molecular orbital (LUMO) are plotted in Fig. 2. As seen from this figure, both 
the HOMO and LUMO states exhibit very extended and periodic (Bloch-like) 
states throughout the molecule. No localization is evident. 

2.5. Electronic states of sampled poly(dA)-poly(dT) DNA configurations

We now consider results for a single configuration from the MD simulation
(labeled step 3001, the first coordinate set 0.5 ps after a 1.5 ns production
simulation). Figure 3 shows the number of accessible atoms, $W(\nu)$, for each
electronic state at this time step for the dehydrated structure. The $W(\nu)$ for the
hydrated DNA structure is shown in Fig. 4. For both structures it is important
to note that, near the HOMO and LUMO, the number of accessible atoms is
quite small (< 30), demonstrating a large degree of localization for the
wavefunctions. This localization extends over several eV and is deeper than just the
band tail states. States further away from the HOMO and LUMO become
considerably delocalised and the number of accessible atoms is much larger. The
number of accessible atoms is also small for the lowest energy levels; these
deep states consist mainly of 2s levels of oxygen and nitrogen atoms. For the
hydrated DNA molecule, the localized states near the HOMO are mainly due to
the surrounding water molecules. Just below these water-related localized
electronic states are the localized electronic states residing on the DNA bases. This
may account for the smaller band gap of the hydrated structure compared with
the electronic structure of the dehydrated DNA. Overall, the electronic
structures for both the hydrated and dehydrated DNA molecules show remarkable
similarities. These results imply that the aqueous environment does not
significantly alter DNA's electronic structure. Therefore, we focus our studies on the
electronic structures of dehydrated DNA molecules.

The degree of localization for two example band states (1074 and 614; a larger
number implies a higher eigenvalue) in the dehydrated DNA structures can be
seen in Fig. 5, where population density plots of a localised and a delocalised
state are shown. As more configurations are analysed, we see consistently that
the states at energy levels near the HOMO have around 20 accessible
atoms. However, as a function of time, different sets of
atoms are involved. To determine where the localization occurs, we compute
a residence of each state according to the specific DNA component (adenine
base, thymine base, ribose backbone, or phosphate group) and determine where
the high probability regions are located. Further investigation indicates that the
highly localised states near the HOMO reside
approximately on single bases in the DNA molecule: adenine for states very
near the HOMO and thymine for states slightly lower in energy. This regional
population information for the HOMO on adenine is plotted in Fig. 6. The
more extended states (~ 8 eV to ~ 18 eV below the HOMO) are found to
reside throughout the various DNA components.

Fig. 2. Population densities for the highest occupied molecular orbital (HOMO) and the lowest
unoccupied molecular orbital (LUMO) are shown for periodic poly(dA)-poly(dT) DNA (10
base pairs). Both molecular orbitals exhibit very extended and periodic (Bloch-like) states
throughout the molecule.

Fig. 3. Number of accessible atoms, $W(\nu)$, for each electronic state near the HOMO and LUMO
levels. Inset shows number of accessible atoms for all levels. The system contains 10 base pairs
of DNA (644 atoms).
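
The residence of a state can be illustrated with a short sketch: the atomic populations of one state are summed over the atoms belonging to each DNA component. The component labels, the toy populations and the function name `residence` are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

# Component labels follow the four groups named in the text.
COMPONENTS = ("adenine", "thymine", "ribose", "phosphate")

def residence(populations, atom_component):
    """Sum one state's atomic populations over each DNA component."""
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()                       # populations normalised to unity
    labels = np.asarray(atom_component)
    return {c: float(p[labels == c].sum()) for c in COMPONENTS}

# Toy state: six atoms with made-up populations and component assignments.
pops = [0.50, 0.30, 0.10, 0.05, 0.03, 0.02]
labels = ["adenine", "adenine", "thymine", "ribose", "ribose", "phosphate"]
res = residence(pops, labels)
dominant = max(res, key=res.get)          # component carrying most of the state
print(res, dominant)
```

By construction the four residences of any state sum to unity, so the dominant entry directly identifies the component on which the state is mainly located.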

As the simulation proceeds in time, the residence of the HOMO level moves
from base to base along the poly(dA)-poly(dT) system, and large jumps in
sequence are possible over this 0.5 ps resolution time scale. This fluctuating
residency of the HOMO is visualised in Fig. 7, which shows population density
plots for a series of snapshots at different times (t=3001, 3004, 3007, and 3010).
The separation between these snapshots is 1.5 ps. Figure 8 shows the location of
the HOMO for all 100 snapshots where the electronic structure was calculated
in this work. The population is localised on different adenine bases as time
progresses and appears to oscillate chaotically from one end of the DNA
molecule to the other. The HOMO level's localization on one adenine base is
traded for localization on another adenine base through the dynamical
simulation. Physically, this trading ought to reflect concerted fluctuations assignable
to off-diagonal dynamical disorder in a regular homooligonucleotide duplex.
Based on these results, it is conceivable that electron (hole) transfer will occur
as two or more localised MO levels are dynamically trading places. Moreover,
this swapping may be gated by thermal fluctuations of hydrated counterions, in
accordance with the ion-gating transport mechanism proposed in Ref. [17].

Fig. 4. Number of accessible atoms, $W(\nu)$, for each electronic state near the HOMO and LUMO
levels. Inset shows number of accessible atoms for all levels. The system contains 10 base pairs
of DNA (644 atoms) and 350 water molecules.

Finally, it is of considerable interest to compare our above results to the
known literature data on this theme. Specifically, our findings parallel
the most recently established dependence of electronic coupling between
DNA bases in the stack on DNA conformational states: a diminution of the
coupling between the DNA purine bases due to the pertinent conformational
changes would 'arrest' the HOMO at one particular base, whereas
conformationally induced increases in this coupling ought to promote the 'HOMO
trading' we revealed here. Our results are also in accordance with the analogous
approach put forth most recently in Ref. [46] and in Ref. [88]. To be capable of
formulating reasonable suggestions for experimentalists, we would need more
detailed calculations not only on poly(dG)-poly(dC), but also on DNA with
mixed base sequences.

Fig. 5. Example of a localised and a delocalised state for two different states in
poly(dA)-poly(dT) at time step 3001. For reference, the HOMO is band 1094.

Fig. 6. Residence of state gives the location of the wavefunction for each energy state. States
very near the HOMO level are located primarily on the adenine bases. For any given state, the
sum of the four residences adds to unity.





Fig. 7. Population density plots for the localised HOMO state as a function of time. The time
between snapshots is 1.5 ps.

Fig. 8. Location of the HOMO as a function of time. The ten bases are the ten adenine bases on
one strand of the DNA. The HOMO is located only on adenine bases.



3. Effective tight-binding Hamiltonians for long DNA strands and complete sequences

In this section, we focus on whether DNA, when treated as a quantum wire in
the fully coherent low-temperature regime, is conducting or not. To this end, we
study and generalise a tight-binding model of DNA which has been shown to
reproduce experimental [25] as well as ab-initio results [89]. A main feature of the
model is the presence of sites which represent the sugar-phosphate backbone of
DNA but along which no electron transport is permissible. We emphasize that
the model is constructed to take into account the HOMO-LUMO gap observed
in the DFT-based studies in section 2 as well as the observed absence of
transport along the backbone. We measure the effectiveness of the electronic
transport by the localisation length $\xi$, which roughly speaking parametrises whether
an electron is confined to a certain region of size $\xi$ of the DNA (insulating behaviour)
or can proceed across the full length $L$ ($< \xi$) of the DNA molecule (metallic
behaviour).

3.1. The ladder model 

A convenient tight binding model for DNA can be constructed as follows: 
it has two central conduction channels in which individual sites represent an 
individual base; these are interconnected and further linked to upper and lower 
sites, representing the backbone, but are not interconnected along the backbone. 
Every link between sites implies the presence of a hopping amplitude. The 
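Hamiltonian $H_L$ for this ladder-like model is given by

$$
H_L = \sum_{i=1}^{L} \left[ \sum_{\tau=1,2} \left( t_{i,\tau} \, |i,\tau\rangle\langle i+1,\tau| + \varepsilon_{i,\tau} \, |i,\tau\rangle\langle i,\tau| \right)
+ \sum_{q=\uparrow,\downarrow} \left( t_i^{q} \, |i,\tau\rangle\langle i,q(\tau)| + \varepsilon_i^{q} \, |i,q\rangle\langle i,q| \right) \right]
+ \sum_{i=1}^{L} t_{1,2} \, |i,1\rangle\langle i,2| \qquad (2)
$$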

where $t_{i,\tau}$ is the hopping amplitude between sites along each branch $\tau = 1, 2$
and $\varepsilon_{i,\tau}$ is the corresponding onsite potential energy, and $t_i^{q}$ and $\varepsilon_i^{q}$ give
hopping amplitudes and onsite energies at the backbone sites. Also, $q(\tau) = \uparrow, \downarrow$
for $\tau = 1, 2$, respectively. The parameter $t_{1,2}$ represents the hopping between
the two central branches, i.e., perpendicular to the direction of conduction.
Results of quantum chemical calculations with semi-empirical wave function bases using
the SPARTAN package [90] suggest that this value, dominated by the
wave function overlap across the hydrogen bonds, is weak and so we choose
$t_{1,2} = 1/10$.¹ As we restrict our attention here to pure DNA, we also set $\varepsilon_{i,\tau} = 0$
for all $i$ and $\tau$. Note that in this way, the energy gap has been made to be
symmetric about $E = 0$. Hence when comparing with the results in section 2, a
constant shift according to the neglected ionisation potentials has to be added.

The model (2) clearly represents a dramatic simplification of DNA.
Nevertheless, in Ref. [25] it had been shown that an even simpler model, in which
base pairs are combined into a single site, when applied to an artificial sequence
of repeated GC base pairs, poly(dG)-poly(dC) DNA, reproduces experimental
current-voltage measurements when $t_i = 0.37$ eV and $t_i^{q} = 0.74$ eV are
used. This motivates the parametrisation $t_i^{q} = 2 t_i$ and $t_{i,\tau} = 1$ for
hopping between like (GC/GC, AT/AT) pairs. Assuming that the wave function
overlap between consecutive bases along the DNA strand is weaker between
unlike and non-matching bases (AT/GC, TA/GC, etc.), we choose $t_{i,\tau} = 1/2$ for these.
Furthermore, since the energetic differences in the adiabatic electron
affinities of the bases are small [91], we choose $\varepsilon_i = 0$ for all $i$. Due to the
non-connectedness of the backbone sites along the DNA strands, the model (2) can
be further simplified to yield a model in which the backbone sites are
incorporated into the electronic structure of the DNA. The effective ladder model reads
as

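$$
\tilde{H}_L = \sum_{i=1}^{L} \sum_{\tau=1,2} \left[ t_{i,\tau} \, |i,\tau\rangle\langle i+1,\tau|
+ \left( \varepsilon_{i,\tau} - \frac{\left( t_i^{q(\tau)} \right)^{2}}{\varepsilon_i^{q(\tau)} - E} \right) |i,\tau\rangle\langle i,\tau| \right]
+ \sum_{i=1}^{L} t_{1,2} \, |i,1\rangle\langle i,2| + \mathrm{h.c.} \qquad (3)
$$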



Thus the backbone has been incorporated into an energy-dependent onsite
potential on the main DNA sites. This effect is at the heart of the enhancement of
localization lengths due to increasing binary backbone disorder reported
previously [42].
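
The elimination of the backbone sites can be checked numerically on a short ladder. The sketch below builds a Hermitian four-sites-per-base-pair matrix in the spirit of model (2), takes one of its eigenvalues $E$, and verifies that $E$ is also an eigenvalue of the two-strand matrix with the onsite shift $-(t^{q})^2/(\varepsilon^{q} - E)$ of Eq. (3). The system size, the random choice of hoppings, the binary backbone energies $\pm 0.5$ and all function names are illustrative assumptions.

```python
import numpy as np

def main_block(t_along, t12, eps_main):
    """2L x 2L matrix of the two base strands; site index is 2*i + tau."""
    L = eps_main.shape[0]
    H = np.zeros((2 * L, 2 * L))
    for i in range(L):
        for tau in (0, 1):
            a = 2 * i + tau
            H[a, a] = eps_main[i, tau]
            if i + 1 < L:                        # hopping along strand tau
                b = 2 * (i + 1) + tau
                H[a, b] = H[b, a] = t_along[i, tau]
        H[2 * i, 2 * i + 1] = H[2 * i + 1, 2 * i] = t12   # across the base pair
    return H

def full_ladder(t_along, t12, eps_main, t_back, eps_back):
    """Add one dangling backbone site per base (no hopping along the backbone)."""
    A = main_block(t_along, t12, eps_main)
    B = np.diag(t_back.ravel())                  # base <-> its own backbone site
    D = np.diag(eps_back.ravel())
    return np.block([[A, B], [B.T, D]])

def effective_ladder(E, t_along, t12, eps_main, t_back, eps_back):
    """Two-strand matrix with the energy-dependent onsite potential of Eq. (3)."""
    return main_block(t_along, t12, eps_main - t_back ** 2 / (eps_back - E))

rng = np.random.default_rng(1)                   # toy parameters only
L = 6
t_along = rng.choice([1.0, 0.5], size=(L - 1, 2))    # like / unlike hopping
eps_main = np.zeros((L, 2))
t_back = np.full((L, 2), 2.0)                        # t^q = 2 t_i in units of t_i
eps_back = rng.choice([-0.5, 0.5], size=(L, 2))      # hypothetical binary disorder

H_full = full_ladder(t_along, 0.1, eps_main, t_back, eps_back)
E = np.linalg.eigvalsh(H_full)[0]                    # lowest eigenvalue
H_eff = effective_ladder(E, t_along, 0.1, eps_main, t_back, eps_back)
mismatch = np.min(np.abs(np.linalg.eigvalsh(H_eff) - E))
print(E, mismatch)
```

The vanishing mismatch reflects the fact that the dangling backbone amplitudes can be solved for exactly, which leaves only the energy-dependent onsite term on the base sites.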

3.2. The numerical approach to localisation in a Hamiltonian tight-binding 
model 

There are several approaches suitable for studying the transport properties of 
the model (2) and these can be found in the literature on transport in solid state 
devices, or, perhaps more appropriately, quantum wires. Since the variation in 
the sequence of base pairs precludes a general solution, we will use two methods 
well-known from the theory of disordered systems [78]. 



¹ Simulations with larger $t_{1,2} \approx 1/2$ give qualitatively similar results.



The first method is the iterative transfer-matrix method (TMM) [92-96] which
allows us in principle to determine the localisation length $\xi$ of electronic states
in systems with cross sections $M = 1$ [25] and 2 (ladder) and length $L \gg M$,
where typically a few million sites are needed for $L$ to achieve reasonable
accuracy for $\xi$. However, in the present situation we are interested in finding
$\xi$ also for viral DNA strands, whose sequences are typically only a few tens of
thousands of base pairs long. Thus in order to restore the required precision, we have
modified the conventional TMM and now perform the TMM on a system of fixed
length $L_0$. This modification has been previously used [97-99] and may be
summarised as follows: after the usual forward calculation with a global
transfer matrix $T_{L_0}$, we add a backward calculation with transfer matrix $T^{b}_{L_0}$. This
forward-backward-multiplication procedure is repeated $K$ times. The effective
total number of TMM multiplications is $L = 2 K L_0$ and the global
transfer matrix is $\tau_L = \left( T^{b}_{L_0} T_{L_0} \right)^{K}$. It can be diagonalised as for the standard TMM
with $K \to \infty$ to give $\tau_L^{\dagger} \tau_L \to \exp[\mathrm{diag}(4 K L_0 / \xi_\tau)]$ with $\tau = 1$ or $\tau = 1, 2$
for fishbone and ladder model, respectively. The largest $\xi_\tau$ for all $\tau$ then
corresponds to the localisation length of the electron on the DNA strand and will be
measured in units of the DNA base-pair spacing (0.34 nm).
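
A compact sketch of such a forward-backward TMM for the two-channel ladder is given below. It propagates two vectors through the $4 \times 4$ transfer matrices, re-orthogonalises them by a QR decomposition after every step, and reads the inverse localisation lengths off the accumulated growth rates. The reflecting path used for the backward sweep, the QR step at every multiplication, the parameter values and the function names are assumptions made for illustration; the implementation of Refs. [97-99] may differ in these details.

```python
import numpy as np

def transfer_matrix(E, eps, t12, bond_in, bond_out):
    """Map (psi_n, psi_{n-1}) to (psi_{n+1}, psi_n) for one rung of the ladder."""
    e_minus_h = np.array([[E - eps[0], -t12], [-t12, E - eps[1]]])
    inv_out = np.diag(1.0 / bond_out)
    top = np.hstack([inv_out @ e_minus_h, -inv_out @ np.diag(bond_in)])
    bottom = np.hstack([np.eye(2), np.zeros((2, 2))])
    return np.vstack([top, bottom])

def localisation_length(E, eps, hop, t12, K):
    """Largest localisation length, in units of the base-pair spacing.

    eps: (L0, 2) onsite energies; hop: (L0 - 1, 2) hoppings between rungs i, i+1.
    The chain is traversed forward and backward K times (assumed reflecting path).
    """
    L0 = len(eps)
    forward = list(range(L0))
    path = (forward + forward[-2:0:-1]) * K + [0]
    Q = np.vstack([np.eye(2), np.zeros((2, 2))])   # start vectors, psi_prev = 0
    log_r = np.zeros(2)
    bond_in = np.ones(2)                           # irrelevant while psi_prev = 0
    for cur, nxt in zip(path[:-1], path[1:]):
        bond_out = hop[min(cur, nxt)]
        T = transfer_matrix(E, eps[cur], t12, bond_in, bond_out)
        Q, R = np.linalg.qr(T @ Q)                 # re-orthogonalise every step
        log_r += np.log(np.abs(np.diag(R)))
        bond_in = bond_out
    gammas = log_r / (len(path) - 1)               # the two positive exponents
    return 1.0 / gammas.min()                      # diverges for extended states

# Clean ladder outside the band: the exact result is 1 / ln(2.5).
eps = np.zeros((50, 2))
hop = np.ones((49, 2))
xi = localisation_length(3.0, eps, hop, 0.1, K=40)
print(xi)
```

For the clean test case the two channels decouple into chains with effective energies $E \mp t_{1,2}$, so the result can be compared with the analytic decay length; for a DNA sequence one would fill `hop` with the like and unlike hopping values.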

The second method that we will use is the recursive Green function approach
pioneered by MacKinnon [100, 101]. It can be used to calculate the dc and ac
conductivity tensors and the density of states (DOS) of a $d$-dimensional
disordered system and has been adapted to calculate all kinetic linear-transport
coefficients such as thermoelectric power, thermal conductivity, Peltier coefficient
and Lorenz number [102].

The main advantage of both methods is that they work reliably (i) for short 
DNA strands ranging from 13 (DFT studies [103]) base pairs up to 30 base pairs 
length which are being used in the nanoscopic transport measurements [89] as 
well as (ii) for somewhat longer DNA sequences as modelled in the electron 
transfer results and (iii) even for complete DNA sequences which contain, e.g. 
for human chromosomes up to 245 million base pairs [104]. 

3.3. Long DNA sequences: λ-DNA, centromeres and (super-)promoters

We shall use 2 naturally occurring long DNA sequences ("strings"). (i)
λ-DNA [105] is DNA from the bacteriophage virus. It has a sequence of 48502
base pairs and is biologically very well characterised. Its ratio $\alpha$ of like to
unlike base pairs is $\alpha_\lambda = 0.949$. (ii) Centromeric DNA for chromosome 2 of
yeast has 813138 base pairs [106] and $\alpha_{\mathrm{centro.}} = 0.955$. This DNA is also rich
in AT bases and has a high rate of repetitions, which should be favourable for
electronic transport.

Another class of naturally existing DNA strands is provided by so-called
promoter sequences. We use a collection of 4986 of these which have been
assembled from the TRANSFAC database and cover a range of organisms such
as mouse, human, fly, and various viruses. Promoter sequences are
biologically very interesting because they represent those places along a DNA string
where polymerase enzymes bind and start the copying process that eventually
leads to synthesis of proteins. On average, these promoters consist of
approximately 17 base pairs, much too short for a valid localization length analysis
by TMM. Therefore, we concatenate them into an 86827 base-pair long
super-promoter with $\alpha_{\mathrm{super-p.}} = 0.921$. In order to obtain representative results, 100
such super-promoters have been constructed, representing different random
arrangements of the promoters, and the results presented later will be averages.
As usual, averages of $\xi$ are computed by averaging the normally distributed $1/\xi$
values.
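
The construction of a super-promoter and the averaging over realisations can be sketched as follows. The three short motifs stand in for the TRANSFAC entries, the function names are hypothetical, and it is an assumption of this sketch that the averaged $1/\xi$ is inverted at the end to quote an average $\xi$.

```python
import random
from collections import Counter

def super_promoter(promoters, rng):
    """Concatenate all promoter strings in one random arrangement."""
    shuffled = list(promoters)
    rng.shuffle(shuffled)
    return "".join(shuffled)

def average_localisation_length(xis):
    """Average xi via the mean of 1/xi, then invert (assumed convention)."""
    inverse_mean = sum(1.0 / x for x in xis) / len(xis)
    return 1.0 / inverse_mean

rng = random.Random(0)                              # fixed seed for repeatability
toy_promoters = ["TATAAT", "TTGACA", "GGGCGG"]      # placeholders, not TRANSFAC data
realisations = [super_promoter(toy_promoters, rng) for _ in range(100)]

# Each realisation would be passed to a localisation-length routine;
# here two made-up values illustrate the averaging step only.
xi_avg = average_localisation_length([10.0, 30.0])
print(realisations[0], xi_avg)
```

Averaging $1/\xi$ rather than $\xi$ weights short localisation lengths more strongly, which is why the example gives 15 rather than the arithmetic mean of 20.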

Occasionally, we show results for "scrambled" DNA. This is DNA with the
same number of A, T, C, G bases, but with their order randomised. Clearly, such
sequences contain the same set of electronic potentials and hopping variations,
but would perform quite differently in a biological context. A comparison of
their transport properties with those from the original sequence thus allows us to
measure how important the exact fidelity of a sequence is. On average, we find
for these sequences $\alpha_{\lambda/S} = 0.899$, $\alpha_{\mathrm{centro.}/S} = 0.9951$ and $\alpha_{\mathrm{super-p.}/S} = 0.901$.

A convenient choice of artificial DNA strand is a simple, 100000 base-pair
long random sequence of the four bases, random-ATGC DNA, which we
construct with equal probability for all 4 bases ($\alpha_{\mathrm{random}} = 0.901$). We shall also
'promote' these random DNA strings by inserting all 4986 promoter sequences
at random positions in the random-ATGC DNA ($\alpha_{\mathrm{random/P}} = 0.910$).
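
The three artificial constructions used here (scrambling, random-ATGC generation and promotion) amount to simple string operations, sketched below. The function names and toy inputs are hypothetical; in particular, this sketch inserts promoters one after another into the growing string, so a later insertion may split an earlier one, which may differ from the procedure actually used.

```python
import random
from collections import Counter

def scramble(sequence, rng):
    """Randomise the order of the bases while keeping the base counts."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)

def random_atgc(length, rng):
    """Random sequence with equal probability for A, T, G and C."""
    return "".join(rng.choice("ATGC") for _ in range(length))

def promote(sequence, promoters, rng):
    """Insert every promoter at a random position of the growing sequence."""
    seq = sequence
    for p in promoters:
        pos = rng.randrange(len(seq) + 1)
        seq = seq[:pos] + p + seq[pos:]
    return seq

rng = random.Random(0)                           # fixed seed, toy data only
original = "ATATATGCGCATTA"                      # made-up sequence
scrambled = scramble(original, rng)
background = random_atgc(1000, rng)
promoted = promote(background, ["TATAAT", "TTGACA"], rng)
print(scrambled, len(promoted))
```

Because scrambling preserves the base counts exactly, any difference in transport between an original and its scrambled version can be attributed to the order of the bases alone.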

3.4. Results for localization lengths 

We have computed the energy dependence of the localization lengths for all
sequences of Section 3.3. In addition, A-DNA, centromeric DNA and the
super-promoter DNA were also scrambled 100 times; the localization length
of each resulting sequence was measured and the appropriate average constructed.
Also, we constructed 100 promoted random-ATGC DNA sequences. As shown
previously [42], the energy dependence of ξ reflects the backbone-induced
two-band structure. The obtained ξ(E) values for the lower band are shown in Fig.
9. In the absence of any onsite disorder, we find two prominent peaks separated
by t_1,2 and ξ(E) = ξ(−E). We also see that A-DNA has roughly the same ξ(E)
dependence as random-ATGC DNA. The super-promoter has larger ξ values
than random-ATGC and A-DNA. Most surprisingly, centromeric DNA,
the longest investigated DNA sequence, has a much larger localization
length than all other DNA sequences. The order of like-to-unlike pair-ratios is
Fig. 9. Localization lengths ξ versus Fermi energy E for various clean DNA strands. Only every
10th symbol is shown. Error bars reflect the standard deviation after sampling the different
sequences for random-ATGC and promoted DNA. The energy is measured in units of the hopping
energy between like base pairs, i.e., t_like = U = 0.37 eV.



α_centro. = 0.955 > α_A = 0.949 > α_super-p. = 0.921 > α_random = 0.901 and

one might expect that transport is favoured in sequences with large α. From
Fig. 9, it is clear that this is not the case: A-DNA has the smallest localization
lengths, but the second largest α.

In Fig. 10, we add results for scrambled and promoted DNA. We find that
promoting a given DNA sequence leads to small increases in the localization length ξ
for random DNA, whereas scrambling can lead to an increase (centromeric and
A-DNA) as well as a decrease (super-promoter). These results suggest that the
promoters have a tendency towards larger localization lengths and thus enhanced
transport.
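
To make the notion of a sequence-dependent localization length more concrete, the sketch below applies a transfer-matrix recursion to a strongly simplified model. It is a single chain with purely off-diagonal disorder and no backbone, so it cannot reproduce the two-band structure discussed above. The rule for "like" pairs (both bases in AT or both in GC), the unlike-pair hopping of 0.5 and all function names are assumptions made for illustration only.

```python
import math
import random

T_LIKE = 1.0    # energies in units of the like-pair hopping
T_UNLIKE = 0.5  # hypothetical value, chosen only for illustration

def hopping(a, b):
    """Hopping between neighbouring bases; 'like' = both in AT or both in GC (assumed)."""
    return T_LIKE if (a in "AT") == (b in "AT") else T_UNLIKE

def localization_length(sequence, energy):
    """Estimate xi (in base pairs) as the inverse growth rate of the wave function."""
    psi_prev, psi = 0.0, 1.0
    t_prev = T_LIKE  # fictitious bond left of the first site; irrelevant since psi_prev = 0
    log_growth = 0.0
    steps = len(sequence) - 1
    for n in range(steps):
        t_next = hopping(sequence[n], sequence[n + 1])
        # E psi_n = t_prev psi_{n-1} + t_next psi_{n+1}, solved for psi_{n+1}
        psi_prev, psi = psi, (energy * psi - t_prev * psi_prev) / t_next
        t_prev = t_next
        # Renormalise at every step to avoid overflow and accumulate the logarithmic growth.
        norm = math.hypot(psi_prev, psi)
        log_growth += math.log(norm)
        psi_prev, psi = psi_prev / norm, psi / norm
    lyapunov = log_growth / steps
    return 1.0 / lyapunov if lyapunov > 0 else math.inf

rng = random.Random(1)  # fixed seed for reproducibility
random_dna = "".join(rng.choice("ATGC") for _ in range(20000))
periodic_dna = "A" * 20000
xi_random = localization_length(random_dna, 0.5)
xi_periodic = localization_length(periodic_dna, 0.5)
print(xi_random, xi_periodic)
```

In this toy model the periodic chain yields a localization length limited only by the chain length, whereas the random sequence yields a much shorter one.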

3.5. Promoter sequences and E. coli binding sites 

Fig. 10. Localization lengths ξ versus Fermi energy E for various clean DNA (solid symbols
as in Fig. 9, error bars not shown for clarity), scrambled DNA (DNA/S, open o, □, o) and
promoted DNA (DNA/R, open △) strands. Only every 10th (20th) symbol is shown for clean
(scrambled/promoted) DNA. Error bars reflect the standard deviation after sampling the
different sequences for random-ATGC, scrambled and promoted DNA.

Let us now turn our attention to the transport properties of individual
promoters rather than the artificially constructed super-promoters. Since their average
length is 17 base-pairs and thus comparable to the localization lengths measured
in the longer sequences, we can no longer use the TMM, but need to
employ the RGFM mentioned in Section 3.2. While this method is capable of
computing all thermoelectric transport coefficients, we shall restrict ourselves
to presenting results for the conductance here.
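
The conductance of a short strand between two leads can be illustrated with a small Green's-function calculation. The sketch below is not the RGFM itself: it inverts the full matrix directly, which is affordable for chains of a few tens of sites. It also assumes a single chain without backbone, semi-infinite leads with hopping t_lead coupled to the end sites with the same hopping, and a dimensionless conductance given by the Landauer transmission. All names are hypothetical.

```python
import numpy as np

def transmission(hoppings, energy, t_lead=1.0):
    """Landauer transmission of a chain whose bond strengths are listed in `hoppings`."""
    if abs(energy) >= 2.0 * t_lead:
        raise ValueError("energy must lie inside the lead band")
    n = len(hoppings) + 1
    h = np.zeros((n, n), dtype=complex)
    for i, t in enumerate(hoppings):
        h[i, i + 1] = h[i + 1, i] = t
    # Retarded self-energy of a semi-infinite lead coupled with hopping t_lead.
    root = np.sqrt(4.0 * t_lead**2 - energy**2)
    sigma = (energy - 1j * root) / 2.0
    h[0, 0] += sigma
    h[-1, -1] += sigma
    green = np.linalg.inv(energy * np.eye(n) - h)
    gamma = root  # broadening, equal to -2 Im(sigma)
    return float(gamma**2 * abs(green[0, -1])**2)

clean = transmission([1.0] * 9, 0.3)          # uniform chain matched to the leads
defect = transmission([1.0, 0.5, 1.0], 0.3)   # one weaker bond, illustrative value
print(clean, defect)
```

A uniform chain matched to the leads transmits perfectly, while a single weaker bond already reduces the transmission.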

In Fig. 11, we show results for the averaged conductance in the upper band; both
arithmetic and typical conductance have been calculated. We first note that the
double-peak structure of Fig. 9 has vanished and only a single peak remains.
This is because our results have been computed with perfectly conducting leads
attached to both ends of the DNA strands. This is close to the experimental
situation, but the purely off-diagonal disorder in the DNA model is now masked
by the ordered leads. Next, we observe that the promoters and their scrambled
copies have larger conductances than random- and A-promoters. A-promoters
have been constructed by cutting sequences with the same lengths as the true
promoters out of A-DNA at randomly selected positions along the DNA. Since





Fig. 11. Energy dependence of the conductance G for promoters, scrambled promoters (/S), 
random promoters (/R) and A-promoters (/L). Solid lines denote the arithmetic, dashed lines the 
typical average of G. The error bars denote standard deviation obtained from the 4986 different 
promoters considered in each category (original and /S, /R, /L). These are not repeated for the 
typical averages for clarity. 



α for random and A-DNA is different, this allows us to check whether it is the
order of base pairs or the value of α which dominates the value of G. Since
α_promoter = 0.928 < α_A = 0.955, but G_promoter > G_A, it appears that, as before,
the transport properties are not simply large if α is large. This suggests that it is
indeed the fidelity of the sequence which is also important.

Typical and arithmetic averages share similar characteristics when comparing
different sequences, as shown in Fig. 11. However, the typical values are
systematically smaller than their arithmetic counterparts. We therefore expect the
distributions to be highly non-Gaussian and in Fig. 12 we see that this is indeed
the case. We first note that both the original promoters and their scrambled
versions (/S) appear to have a slightly larger weight at G > 0.05, whereas both
random and A-DNA are peaked at G ≈ 0.025. In addition, we find that there
is a peak in the conductance distribution P(G) at G ≈ 0.26. This peak is most
pronounced for the original promoters and their scrambled cousins, but much
Fig. 12. Distribution function P(G) for the conductances averaged over the energy range
[−5, 5]. Only promoter and E. coli results have been shaded.

smaller for the artificial random- and A-promoter. 
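
The difference between the two averages can be made explicit with a small sketch. It assumes the usual definition of the typical average as the exponential of the mean logarithm (the geometric mean). The conductance values are made up, loosely mimicking a main peak at small G with a rare contribution at larger G.

```python
import math

def arithmetic_average(values):
    """Plain mean of the values."""
    return sum(values) / len(values)

def typical_average(values):
    """Geometric mean, exp(<ln G>), assumed here as the definition of the typical value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

conductances = [0.02, 0.025, 0.03, 0.26]  # made-up values, for illustration only
g_arith = arithmetic_average(conductances)
g_typ = typical_average(conductances)
print(g_arith, g_typ)
```

Rare large values pull the arithmetic average up but affect the typical average far less, which is why a persistent gap between the two hints at a broad, skewed distribution.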

In Figs. 11 and 12, we have also included results for 802 computationally inferred
E. coli binding sites [107]. Sequence-specific DNA-binding proteins perform
a variety of roles in the cell, including transcriptional regulation. Our
results show that the total conductance of these sequences is smaller than for
promoters. However, their average length is ≈ 25 base-pairs, so that the average
conductivity is in fact larger when compared to promoters. This might be important in
a biological context where one could envisage proteins identifying their binding
sites through differences in local conductivity.
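
The length argument can be written out explicitly under the assumption, not stated in the text, that the conductivity of a one-dimensional strand scales as the conductance times its length. The subscripts "bind" and "prom" are labels introduced here for the binding sites and the promoters, and the average lengths 25 and 17 are those quoted above.

```latex
\sigma \propto G\,L , \qquad
\frac{\sigma_{\rm bind}}{\sigma_{\rm prom}}
  = \frac{G_{\rm bind}}{G_{\rm prom}} \, \frac{L_{\rm bind}}{L_{\rm prom}}
  \approx \frac{25}{17} \, \frac{G_{\rm bind}}{G_{\rm prom}}
  \approx 1.47 \, \frac{G_{\rm bind}}{G_{\rm prom}} .
```

Under this assumption the binding sites have the larger conductivity as long as their conductance exceeds about 17/25 = 0.68 of the promoter conductance.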



4. Summary 

The results presented in this chapter are preliminary, but they indicate a
marked difference in the nature of the electronic HOMO-LUMO states for the
periodic and aperiodic structures of duplex DNA. The HOMO-LUMO states of the
periodic structure are quite extended, as would be expected for Bloch-like
states, while the HOMO-LUMO states of the aperiodic structure are more
localized. The concept of static localization in
short DNA has previously been considered by Ladik [58, 59], and our results
show that such localization in our structure for aperiodic poly(dA)-poly(dT)
DNA reaches far deeper in energy than just the band-tail states. The localization
phenomenon observed in the DNA double helix is the so-called Anderson
localization, which is attributed to off-diagonal disorder. This disorder results
from dynamical variations in DNA intramolecular interactions and from the coupling of
DNA with its environment. Turning our attention to longer DNA sequences, we
next used this insight by modelling DNA as an off-diagonally disordered Anderson
chain. However, in addition, and in contrast to previous studies using
Anderson-type models, we include the sugar-phosphate backbone explicitly and
by doing so retain the essential semi-conducting structure observed in some
experiments. Our results for the localization lengths suggest extended states
even in non-periodic DNA over distances of up to ≈ 20 base-pairs. This is roughly
consistent with the previous results. Next, we studied how transport properties
differ between sequences and found that promoter sequences seem to have a
tendency towards larger localization lengths, i.e. enhanced transport. This might
point towards the importance of an electronic mechanism in the initial stages of
DNA polymerase. Our results warrant further investigation, as the role of the
dynamical localization and the sequence dependence may very well suggest an
important mechanism of charge transport along the DNA molecule.